Bilingual Distributed Word Representations from Document-Aligned Comparable Data

Abstract

We propose a new model for learning bilingual word representations from non-parallel document-aligned data. Following the recent advances in word representation learning, our model learns dense real-valued word vectors, that is, bilingual word embeddings (BWEs). Unlike prior work on inducing BWEs which heavily relied on parallel sentence-aligned corpora and/or readily available translation resources such as dictionaries, the article reveals that BWEs may be learned solely on the basis of document-aligned comparable data without any additional lexical resources nor syntactic information. We present a comparison of our approach with previous state-of-the-art models for learning bilingual word representations from comparable data that rely on the framework of multilingual probabilistic topic modeling (MuPTM), as well as with distributional local context-counting models. We demonstrate the utility of the induced BWEs in two semantic tasks: (1) bilingual lexicon extraction, (2) suggesting word translations in context for polysemous words. Our simple yet effective BWE-based models significantly outperform the MuPTM-based and context-counting representation models from comparable data as well as prior BWE-based models, and acquire the best reported results on both tasks for all three tested language pairs.

Bibliographic record

Authors
Vulić, Ivan; Moens, Marie-Francine
Year 2016
Original format PDF

Similar documents

1. Bilingual Distributed Word Representations from Document-Aligned Comparable Data [J]. Moens Marie-Francine, Vulić Ivan. The Journal of Artificial Intelligence Research. 2016, Issue 10
2. Bilingual Distributed Word Representations from Document-Aligned Comparable Data [J]. Vulić Ivan, Moens Marie-Francine. The Journal of Artificial Intelligence Research. 2016
3. EEG decoding of spoken words in bilingual listeners: from words to language invariant semantic-conceptual representations [J]. João M. Correia, Bernadette Jansma, Lars Hausfeld. Frontiers in Psychology. 2015, Issue 4
4. Bilingual Lexicon Extraction with Temporal Distributed Word Representation from Comparable Corpora [C]. Chunyue Zhang, Tiejun Zhao. Natural language processing and Chinese computing. 2015
5. The representation of multiple translations in bilingual memory: An examination of lexical organization for concrete, abstract, and emotion words in Spanish-English bilinguals. [D]. Basnight-Brown, Dana M. 2009
6. EEG decoding of spoken words in bilingual listeners: from words to language invariant semantic-conceptual representations [O]. João M. Correia, Bernadette Jansma, Lars Hausfeld
7. Bilingual distributed word representations from document-aligned comparable data [O]. Vulić Ivan, Moens Marie-Francine. 2016
